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Abstract — Consistent, continuous, and long time series of global 
biophysical variables derived from satellite data are required 
for global change research. A novel climatology fitting approach 
called CACAO (Consistent Adjustment of the Climatology to Ac- 
tual Observations) is proposed to reduce noise and fill gaps in time 
series by scaling and shifting the seasonal climatological patterns 
to the actual observations. The shift and scale CACAO parameters 
adjusted for each season allow quantifying shifts in the timing 
of seasonal phenology and inter-annual variations in magnitude 
as compared to the average climatology. CACAO was assessed 
first over simulated daily Leaf Area Index (LAI) time series with 
varying fractions of missing data and noise. Then, performances 
were analyzed over actual satellite LAI products derived from 
AVHRR Long-Term Data Record for the 1981-2000 period over 
the BELMANIP2 globally representative sample of sites. Compar- 
ison with two widely used temporal filtering methods — the asym- 
metric Gaussian (AG) model and the Savitzky-Golay (SG) filter 
as implemented in TIMESAT — revealed that CACAO achieved 
better performances for smoothing AVHRR time series charac- 
terized by high level of noise and frequent missing observations. 
The resulting smoothed time series captures well the vegetation 
dynamics and shows no gaps as compared to the 50-60% of still 
missing data after AG or SG reconstructions. Results of simulation 
experiments as well as confrontation with actual AVHRR time 
series indicate that the proposed CACAO method is more robust 
to noise and missing data than AG and SG methods for phenology 
extraction. 

Index Terms — Advanced Very High Resolution Radiometer 
(AVHRR), climatology fitting, gap filling, inter-annual anomalies, 
leaf area index (LAI), phenology, temporal smoothing. 

I. INTRODUCTION 

L EAF area index (LAI) is recognized as an Essential Cli- 
mate Variable [44] that plays a key role in a number 
of processes [12]. Global LAI products are routinely pro- 
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duced from various moderate spatial resolution sensors such as 
VEGETATION [3], [6], Moderate Imaging Spectroradiometer 
(MODIS) [28], or the Advanced Very High Resolution Ra- 
diometer (AVHRR) [17]. However, the current LAI products 
are spatially and temporally discontinuous mainly due to cloud 
occurrence, residual atmospheric or directional effects, snow 
cover, retrieval algorithm failure or sensor problems which limit 
their use for the monitoring of vegetation dynamics. To better 
describe and understand vegetation dynamics in response to 
climate fluctuations and trends (e.g. temperature, rainfall, solar 
radiation) and/or anthropogenic forcing (e.g., ground water 
extraction, farming decisions, change in land use) [9], [10], 
[29], [31], consistent and continuous long time series of global 
LAI products are thus required. Variations in magnitude and 
shifts in the timing of vegetation phenology are key indica- 
tors for global change issues [41]. However, extracting the 
associated metrics to quantify trends, anomalies, or changes 
from the satellite time series is not straightforward: the noise 
in the data and missing observations may induce significant 
uncertainties in the estimation of these metrics [23]. The lit- 
erature shows a broad variety of strategies designed to reduce 
noise and fill gaps in time series including the widely used 
asymmetric Gaussian (AG) [23], Savitzky-Golay (SG) filter 
[13], [37], double logistic function [20], [25], Fourier analysis 
[9], and wavelet decomposition [33]; as well as to extract 
phenological metrics and monitor vegetation dynamics based 
on thresholds [29], [42], moving averages [32], empirical equa- 
tions [27], conceptual-mathematical phenology models [15], 
first derivatives and piecewise logistic functions [43], spectral- 
frequency decomposition techniques [9], [33], [34], and curve 
fitting [23], [25]. The choice of the smoothing gap filling or 
compositing method may have a large impact on the accuracy 
of the extraction of phenology indicators since their ability 
to preserve the integrity (magnitude and shape) of the overall 
time series may vary [1], [21]. Recently, [41] compared several 
methods for extracting phenological timing and found some 
large discrepancies, up to ±60 days in the detection of start 
of the season. Most of these time series analysis have been 
carried out using the normalized difference vegetation index 
(NDVI) which is a proxy of vegetation biophysical variables. 
However, LAI offers the advantage to be more sensitive to 
the larger vegetation amount as compared to NDVI or other 
vegetation characteristics such as the Fraction of Absorbed 
Photosynthetically Active Radiation [2], [30]. 

A novel approach is proposed here to smooth and fill gaps 
in LAI long time series derived from AVHRR observations at 
0.05° spatial resolution [39]. The method is based on the typical 
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Fig. 1. Flow chart of the algorithm for deriving consistent and continuous long-term daily LAI data set as well as quantifying temporal anomalies. 


seasonal pattern of each pixel, i.e., the pixel phenology model. 
It is used as a background information to fill gaps and smooth 
the time series of daily LAI estimates derived by learning 
GEOV 1 LAI products [5]. The phenology model is based on the 
climatology of LAI values computed as the average value for a 
given date in the year across all years of the time series. For 
each pixel, the corresponding phenology model is adjusted to 
each season of the time series by shifting its time reference and 
scaling its magnitude. As opposed to several other smoothing 
and gap filling methods listed previously, the phenology model 
derives only from the satellite observations themselves. The 
resulting smoothed and gap filled time series provides the shift 
and scale parameters associated to each season as subproducts. 
They can be used to quantify inter-annual deviations in vegeta- 
tion status from the average pattern often called “anomalies.” 
Because the same climatological phenology model is used 
across all the years, the method applies strictly to quantify 
phenological changes, i.e., change in the parameters of the 
phenology model, rather than a change of the phenology model 
[16] corresponding to a change generally associated to abrupt 
disturbances [26], [35]. 

This study presents the principles of the proposed method. 
It is then evaluated against other methods, focusing on the 
capacity to smooth the daily LAI estimates and to fill the gaps. 
Finally, the interest of the resulting shift and scale parameters 
to quantify anomalies is evaluated. 

II. Data and Methods 

Global long-term LAI time series derived from AVHRR 
in the 1981-2000 period were used to assess the proposed 
approach under a variety of conditions. The analysis was con- 
ducted over the 445 BELMANIP2 (BEnchmark Land Multi- 
site Analysis and Inter-comparison of Products) sites that are 
supposed to represent the possible range of surface types and 
conditions over the Earth [Fig. 5(b)] [4]. The LAI data set is 


first presented, followed by a description of the main steps for 
the implementation of the proposed method (Fig. 1) as well 
as the AG and SG companion methods from TIMES AT. Then, 
the simulation experiment designed for assessing the methods 
under varying fraction of missing data and noise level and the 
metrics are described. 

A. Generation of a Long-Term LAI Data set 

The derivation of the long-term daily LAI data set is based on 
the previous work of [36] demonstrating that neural networks 
could be trained to consistently estimate a given product from 
the reflectance measured by another sensor providing that a 
strong link exists between the inputs (radiometric signal) and 
outputs (the products). This principle is applied here to mimic 
GEOV1/VGT dekadal LAI product [6] from historical AVHRR 
archive data. 

AVHRR Long-Term Data Record (LTDR, Version 3) [45] 
provides global coverage at 0.05° (5.6 km at equator) sampling 
interval in a latitude/longitude climate modeling grid and at a 
daily temporal step for the 1981-2000 period. LTDR top of the 
canopy normalized reflectances (nadir, sun at 45°) result from 
the reprocessing of Global Area Coverage data set by apply- 
ing the preprocessing improvements identified in the AVHRR 
Pathfinder II and MODIS projects for radiometric calibration, 
geometric correction, cloud screening, and corrections of the 
atmospheric and directional effects [38], [39]. 

GEOV1/VGT LAI product, available at [46], provides global 
coverage for the period 1999-2010 at 1 km spatial resolution 
and dekadal time step [6]. Recent validation studies showed the 
GEOV 1 products to outperform the currently available products 
both in terms of accuracy and precision [11]. 

A back-propagation neural network with a relatively simple 
architecture made of one hidden layer of five tangent sigmoid 
neurons and one output layer with one linear neuron was 
considered based on the previous findings of [36]. Red and 
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Fig. 2. Illustration of CACAO implementation over the cropland site #128 (Fig. 5(b); Lat = —23.48°, Lon = 28.20°). (a) Climatology computation. Dots 
indicate the accumulated LAI daily data over the period 1981-2000. The bold line corresponds to the climatology computed as the median of inter-annual values 
over a 30-days compositing window at a dekadal step. The several gray values correspond to 50% (dark gray), 95% (medium gray) and 99% (light gray) of 
the population of values for a given date, (b) CACAO fitting for the period 1996-2000. Circles correspond to raw LAI data. The dotted line corresponds to the 
climatology and the continuous line corresponds to the CACAO time series resulting from adjusting the climatology to the data at a daily step. The plus symbols 
indicate the position of the minima and maxima in the climatology and the segments indicate the duration of the sub-seasons for the fitting. 


near-infrared AVHRR LTDR surface reflectance normalized at 
nadir viewing for a 45° sun zenith angle were used as inputs. 
The network was trained over the BELMANIP2 sites in the 
1999-2000 period in which LTDR and GEOV1 products are 
coexisting. Differences in spatial resolution between AVHRR 
LTDR (0.05°) and GEOV1/VGT (1/112°) products were taken 
into account by averaging the GEOV1/VGT products over 
3x3 km 2 . Residual differences due to spatial resolution and 
geometry uncertainties are expected to be small because the 
BELMANIP2 sites were selected to be relatively homogeneous 
at the medium spatial resolution [4]. The dekadal GEOV1/VGT 
product was linearly interpolated at the dates for which LT- 
DRV3 reflectances were available since the objective is to 
provide daily LAI products. This interpolation benefits from 
the smooth character of GEOV1/VGT temporal profiles [11]. 
To account for possible spectral sensitivity differences between 
the several AHVRR-XX sensors available over the 1981-1998 
period and AVHRR- 14 used in 1999-2000 for the neural net- 
work training, AVHRR-XX reflectances in the red and near 
infrared were corrected to mimic the AVHRR- 14 ones. The 
correction was achieved by computing a scaling factor between 
the reflectance of AVHRR-XX with that of AVHRR-14. The 
correction factor was adjusted over radiative transfer model 
simulations using the PROSAIL radiative transfer model [22] 
and the specific spectral responses of the several AVHRR-XX 
sensors. A flowchart of the retrieval algorithm is presented in 
Fig. 1. Further information about the generation of the LAI data 
set is provided in [7] . 


B. Consistent Adjustment of the Climatology to 
Actual Observations 

A climatology defined as the inter-annual median of the daily 
products available within a 30-days compositing window (±15 
days) was generated at a dekadal temporal step (a 10-days 
period; there are thus 36 dekads over one year) and at the pixel 
scale (0.05° spatial resolution). It corresponds to the phenology 
model used later for smoothing and gap filling. The clima- 
tology value was computed for each dekad if a minimum of 
five observations over the 30-days compositing window exist. 
Climatology computation is illustrated in Fig. 2(a). The whole 
period (1981-2000) was used to minimize the probability of 
finding a gap. The resulting dekadal pixel phenology model was 
generally showing very smooth profiles [Fig. 2(a)]. It was thus 


linearly interpolated to provide a daily climatology, LAI clim , 
consistent with the daily time step of the daily LAI products 
derived from AVHRR observations. 

A Consistent Adjustment of the Climatology to Actual Ob- 
servations (CACAO) was then performed by shifting (shift) and 
scaling (scale) the phenology model in order to minimize the 

root mean square error ( RMSE ), between the actual daily LAI 

/ x — CACAO 

estimates, LAI(t ), and the CACAO estimates LAI = 

scale. LAI clim (t ± shift) over portions of the seasonal cy- 
cle called “sub-seasons”: 


RMSE = 


N 


1 x — / /N C AC AO \ ^ 

- X (LAKti) - LAI ( u )) 


( 1 ) 


where t is the time in days and n is the number of available dates 
of observations during the sub-season. If the actual observations 
follow the phenology model, LAI clim (t), the scale factor 
and the temporal shift parameter are by definition scale = 1 
and shift = 0. The cost function was evaluated for 121 values 
of the shift (—60 days < shift < 60 days with steps of 1 
day), the scale being adjusted using a linear regression with 
no intercept. However, difficulties to fit the shift parameter are 
expected when a small number of observations are available 
in the sub- season or if the data present a limited seasonality. 
Therefore, CACAO is applied if a minimum of n = 10 ob- 
servations representing at least 30% of the LAI climatology 
amplitude (difference between minima and maxima in the cli- 
matology values) within the sub-season is available. Otherwise, 
the climatology is used as a backup solution. 

A sub- season is defined as the period between two consecu- 
tive minimum and maximum in the LAI climatology. To allow 
more robust fitting by providing better handle on temporal 
features and smoother transitions between sub- seasons, the sub- 
season period used to adjust the phenology model is slightly 
extended. This extended period added before (respectively, 
after) the sub- season should contain either 30% of the season 
amplitude or 30% of the period length (in days) of the previous 
(resp. next) sub-season. Over the transition periods where CA- 
CAO fitting extended sub- seasons overlap, a weighted average 
- CACAO 

of LAI values is finally computed to get continuous and 

smooth temporal profiles. CACAO fitting process is illustrated 
in Fig. 2(b) with each sub-season used to adjust the phenology 
model displayed. 
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Similar to other methods [19], [33] based on the fitting of a 
phenology model, CACAO implicitly assumes that the vegeta- 
tion species composition and state did not dramatically change 
so that a single phenology model can be used over the whole 
20-years period. CACAO is therefore expected to encounter 
difficulties in case of disturbances such as landcover or land- 
use change, fire or flood events covering a large fraction of the 
pixel. However, the sub- season fitting approach provides some 
capacity to adapt to reasonable deviations from the original 
shape of the phenology model. Further, disturbances occur 
mainly at finer spatial scale as compared to the 0.05° coarse 
spatial resolution considered here. They are therefore expected 
to impact only occasionally the shape of the phenology model. 
Difficulties are also expected when artifacts in phenology 
model may appear due to a small number and/or very noisy 
observations are observed during the period used to compute 
the climatology. However, the daily temporal sampling of 
AVHRR observations, the 30-days compositing window, and 
the 20-years period used to compute the climatology decrease 
the probability to get strong artifacts in the phenology model. 
Nevertheless, the 30-days compositing window and the averag- 
ing over the 20 years may flatten particular features in the phe- 
nology model such as rapid LAI changes shifted significantly 
across years and sub-seasonal anomalies mostly linked with 
unusual temperature or rainfall events. CACAO further assumes 
that the actual LAI as a function of time is only proportional to 
the shifted climatological LAI value without considering any 
possible LAI offset. This approximation is reasonable since 
the minimum LAI values are generally relatively small and 
stable in absolute term. In all these situations, the RMSE value 
will provide a quantitative quality assessment indicator of the 
consistency between the phenology model and actual daily ob- 
servations. Finally, although CACAO is able to provide realistic 
temporal profiles in case of marginal seasonality observed over 
bare areas or evergreen forests, it is expected to be more difficult 
to fit the shift parameter in a robust way. Therefore, the time 
shift parameter should not be interpreted in such conditions. 

C. AG and SG TIMESAT Methods 


AVHRR time series characterized by more than 70% of missing 
data (Section III-A3), a simple gap filling method was applied: 
first, an iterative linear interpolation was used to fill gaps 
smaller than 120 days [37]; second, a temporal compositing 
was applied at 10-day step through a Gaussian function with 
a 30-day compositing period. 

In addition to the time series reconstructions, TIMESAT 
toolbox allows to compute some phenological metrics including 
the season amplitude (i.e., difference between the maximal 
value and the base level), the start of season (hereafter referred 
as SoS) and the end of season (hereafter referred as EoS). The 
time of the SoS (EoS) is defined as the time for which the left 
(right) edge reached 20% of the seasonal amplitude measured 
from the left (right) minimum level. 

D. Simulation Experiment 

A simulation experiment was conducted to evaluate the 
performances of the several methods in presence of noise and 
missing observations as already proposed by [34], [37], [40]. 
For this purpose, the complete and smooth time series resulting 
from the median of CACAO, AG, and SG reconstructed profiles 
for site #338 showing a double season (Fig. 4) was first consid- 
ered as a reference, LAI re f. Note that only few observations 
are missing over site #338, resulting in a relatively good con- 
sistency between the three methods investigated. Then, missing 
observations were randomly distributed, and several levels of 
white noise were introduced in the reference LAI time series to 
simulate the actual daily LAI data, LAI day The noise compo- 
nent was generated using a normal distribution N(0, a) (mean 
value equal to 0 and variance, a 2 ), i.e., LAId ay = LAI re f + 
N(0, a). The values for the absolute LAI uncertainty used in this 
study were varying between cr = 0 and a = 0.5 by 0.05 steps. 
The fraction of missing data was ranging between 0 and 0.85; 
few tests showed that for fractions of missing data larger than 
0.85, all the methods were unreliable and AG and SG failed 
in most of cases which prevents deriving meaningful statistics. 
Finally, the CACAO, AG, and SG methods were applied to the 
simulated time series with variable fraction of missing data and 
level of uncertainties. 


For comparison purposes, CACAO was compared with two 
widely used temporal filtering methods: AG and adaptive SG 
as implemented in the TIMESAT toolbox for analyzing time 
series of satellite observations [24], [47]. Double logistic func- 
tion produces very similar results as AG while being more 
sensitive to discontinuities in the data in agreement with [18]. 
TIMESAT has been successfully applied to extract phenology 
from AVHRR time series and was found to outperform Fourier 
decomposition methods that were experiencing difficulties in 
presence of noise and missing data such as in the AVHRR series 
[23]. The adaptive SG-filtering method uses local second-order 
polynomial functions in fitting three observations at each side of 
the date being processed. AG method uses a Gaussian function 
that is fitted to data around maxima and minima in the time 
series. 

AG and SG methods failed in processing a significant part 
of the daily AVHRR time series characterized by many gaps. 
Indeed, data cannot be processed if there is a missing time 
period longer than 0.2 years or when more than 25% of data 
are missing over the entire time series. To use AG and SG over 


E. Metrics 


The reconstructed time series were then compared with 
the original reference data and the corresponding RMSE re f 
computed and used as an indicator of performances: 


RMSE re f = 



• v 2 

E - LAi ref (tiYj 


( 2 ) 


where LAI (t) is the estimated LAI value at date t, LAI re f (t) 
is the reference LAI data, and n is the number of dates in the 
time series for which the reconstructions of the three compared 
methods are available. In order to obtain representative values, 
the RMSE re f was derived from 50 iterations for each level of 
noise and gap fraction. The closeness to the actual daily raw 
LAI data is also evaluated using RM SE raw 


RM S E raw 


N 


1 

n 


E 


- LAI r 



(3) 
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Similar metrics were used for the evaluation of the pheno- 
logical parameters (P) extracted from CACAO and TIMESAT 
methods. RMSE^ e ^ is defined as 


RMSEf ef = 



it 2 

J2(m)-Pref(ti)) (4) 

i= 1 




(c) □ 

CACAO 

*■ 

AG 

O 

SG 


n ? 

i 8 S 

ggS 8§0 


°0 0.2 0.4 0.6 0.8 

Fraction of missing data 


where P(t) is the estimated phenological P parameter at date 
t, P re f (t) is the corresponding P parameter as computed from 
the reference time series, and n is the number of available 
dates used in the comparison. When the methods are applied 
to actual AVHRR time series, a reference is not available, 
and the deviation of P parameters from their mean value (the 
climatological one) is considered in this case for computing the 
RMSE, p 


RMSE r = 


N 


1 m 1 n 

iz- n z 


rn z ' n 
j = i i = i 


( 5 ) 


where Pj(ti) is the P value at date t for the time series j, Pj is 
the mean value of P along the time series j, n is the number 
of dates corresponding to the phenological P stage in the time 
series, and m the number of time series. 


III. Results 

The performances of the CACAO method was evaluated 
against TIMESAT AG and SG methods for A) improving the 
consistency and continuity of the time series and B) for the 
characterization of inter-annual anomalies through time shifts 
of phenological stages and magnitude (scale) of LAI values. 
The assessment was performed over the BELMANIP2 sites 
across the main biome classes based on the GLOBCOVER land 
cover map [8]. 

A. Performances of CACAO for Improving the Consistency 
and Continuity of Time Series 

The theoretical performances of CACAO, AG, and SG are 
first presented. Then, typical temporal profiles are inspected. 
Finally, the continuity and consistency are evaluated over the 
entire BELMANIP2 sites. 

1 ) Theoretical Performances as a Function of Fraction of 
Missing Data and Daily LAI Uncertainties: The theoretical 
performances were evaluated over the simulated data using the 
RMSE re f value computed with the reference data. Results 
show that for low to medium values of fraction of missing 
data, the performances are mainly depending on the daily LAI 
uncertainties [Fig. 3(a)]. For the larger fraction of missing data, 
CACAO performances are strongly degrading with the fraction 
of missing data and the daily LAI uncertainties. 

The performances of SG and AG methods as a function of 
the fraction of missing data and the daily LAI uncertainties 
show similar patterns as those of CACAO (figures not shown 
for the sake of brevity). However, Fig. 3(b) demonstrates that 
the CACAO method outperforms AG and SG methods in all 
the situations, particularly for the higher levels of the fraction 
of missing data and daily LAI uncertainties. AG gets slightly 
lower RMSE re f values as compared to SG. 


Fig. 3. (a) Evaluation of theoretical performances of CACAO in terms of 

RMSE re f as a function of the fraction of missing data varying between 0 and 
0.85 and the a noise level ( LAI no i sy = LAI + K(0, cr)) varying between 0 
and 0.5. (b) Comparison of the RMSE re f of the two TIMESAT methods 
with the RMSEref of CACAO as evaluated over the reference data, (c) 
RMSEraw of CACAO, AG, and SG over the raw data as a function of the 
fraction of missing data. 

Although the CACAO reconstructed LAI time series is the 
closest to the reference LAI values, it is the one that allows more 
departure from actual daily LAI values (higher RMSE raw 
values), particularly when the daily LAI data are contaminated 
with large uncertainties [Fig. 3(c)]. The use of the climatology 
as background information provides robustness gap filling and 
smoothing processes. 

2 ) Inspection of Typical Temporal Profiles: The main fea- 
tures associated to the temporal consistency of CACAO, AG 
and SG are illustrated over a selection of six sites representing 
different conditions with the location indicated in Fig. 5(b). 
The sites were also selected so that AG and SG methods 
were not failing. In most of the cases, a good agreement is 
found between the three methods that similarly fit the raw 
daily LAI data when few observations are missing (Fig. 4, site 
#338). However, the reliability of the retrieved temporal profile 
depends on the level of noise and the presence of gaps in the 
data as demonstrated earlier with the simulated cases. Noisy 
daily LAI data are partially filtered out for AG and SG methods 
as implemented in TIMESAT where single spikes are removed 
based on the distance to neighbor data. This approach may be 
useful to eliminate possible high temporal frequency residual 
artifacts but appears insufficient to process AVHRR series in 
case of significant uncertainty associated to the daily LAI data 
(e.g., Fig. 4, site #436) and repetitive occurrence of consecutive 
outliers (year 1988 in site #325). SG seems to be the most 
sensitive to accidents in the data: Fig. 4, site #436 which 
is expected having a minimum seasonality; site #395, winter 
time for years 1982, 1984, and 1990; or site #325, year 1990 
during the maximum development of vegetation. Conversely, 
AG reduces most of the residual noise. The assumption made 
in AG about the shape of the seasonal phenology development 
helps reducing the effect of noise in the data. However, AG 
may present some limitations to reproduce abrupt variations in 
LAI, related to the emergence, greening up, wilting, or harvest- 
ing of the crops within a short growing season or secondary 
sub- seasons as for site #69 in Fig. 4. In contrast, SG is not 
explicitly tied to a specific shape regarding phenology devel- 
opment and thus better adapt to local characteristics of the 
observed signal (e.g., double seasonality in site #69). Similar 
to AG, CACAO appears to be robust to noise. However, while 
the phenological shape function is fixed in AG, CACAO is 
able to adapt it from pixel to pixel thanks to the corresponding 
phenology model derived from the climatology, which provides 
additional flexibility. Nevertheless, some problems still remain 
in the CACAO approach: scaling and shifting the climatology 
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# 436 Evergreen Broad leaf Foresl La1= 3.05° Lon= -69.84° 



#396 Deciduous Broadieaf Foresl La1= 52.01° Lon= 87.63° 





82 84 86 88 90 92 


# 279 Shrub Savana Bare Lat= -30.62° Lon= 129.52° 



Fig. 4. Temporal profiles of raw LAI data (black circles), CACAO (red line), 
AG (green line), and SG (blue line) LAI reconstructions for six BELMANIP2 
sites from 1982 to 1992. The # number of each site refers to Fig. 5(b). The 
GLOBCOVER biome class [8] and the latitude and longitude are also indicated. 
(For interpretation of the references to color in this figure legend, the reader is 
referred to the web version of this article.) 

may be inappropriate to depict local rapid changes (e.g., local 
minima in site #69) or sub- seasons that strongly differ from the 
average seasonality (e.g., Fig. 4, site #279, year 1983). 

The temporal continuity in the AVHRR data is generally poor 
in the equator (e.g., Fig. 4, sites #436) due to the persistence 
of clouds and at high latitudes in winter time (e.g., Fig. 4, 
site #395) mainly due to snow and cloud cover as well as 
poor illumination conditions due to large sun zenith angles. 
CACAO approach appears very useful to fill gaps in the time 
series and may overcome some of the limitations of AG and 
SG to deal with gaps (e.g., Fig. 4, site #325, years 1982-1983). 
Note that when there is no enough data over the entire sub- 
season for fitting the model, the climatology is assumed as 
the most probable solution (e.g., Fig. 4, site #436, years 1982, 
1987-1989, 1990-1991; site #325, year 1983) which may be 
not realistic for very anomalous years. 

3) Continuity of the Time Series: The total fraction of 
missing AVHRR daily observations is around 73% over the 
445 BELMANIP2 sites in the period 1981-2000. It mainly 
occurs along the equatorial zone because of high probability 
of cloud coverage and in winter (respectively summer) in the 
northern (respectively southern) higher latitudes because of the 
poor illumination conditions [Fig. 5(a)]. This demonstrates that 
the continuity of time series of the raw AVHRR observations 


may be very poor in many situations. The CACAO method 
allows filling all the missing data, even when there is not 
enough data in a sub- season to achieve a good fit of the 
phenology model. In this case, the phenology model derived 
from the climatology is used. This happens in about 23% of 
cases (pixels x dates). Note that there was always enough data 
to compute the climatology of each of the 445 BELMANIP2 
sites over the 20-years period and the 30-days local compositing 
window. Cases with missing climatology are therefore expected 
to be very limited over the globe. 

Conversely, AG and SG methods were failing over, respec- 
tively, 238 (53% of the sites) and 209 (47%) sites where there 
were not enough high-quality observations to fit a curve over a 
given local time period. These sites were mostly located at the 
equator and at high latitudes where there are long periods with 
missing data [black filled circles in Fig. 5(b)]. AG fails in more 
cases than SG because AG requires a minimal seasonality in the 
time series to fit the AG model: 29 desert sites are not processed 
[gray-filled circles in Fig. 5(b)]. 

4) Temporal Consistency: Because there are generally no 
reference LAI values for most of the sites, the temporal con- 
sistency is evaluated based on the closeness to actual daily data 
(. RMSE raw ) and the smoothness of the temporal profiles. The 
spatio-temporal distribution of RMSE raw [Fig. 6(a)] shows 
obvious patterns, with higher uncertainties in winter for the 
higher northern latitudes and around the equator. The higher 
RMSE raw values correspond to the more difficult observa- 
tional conditions over regions/periods with permanent snow 
and cloud cover [cf. Fig. 5(a) and Fig. 6(a)]. The relationship 
between closeness to daily observations ( RMSE raw ) shows 
that all the three methods agree well for fraction of missing data 
lower than 0.5 [Fig. 6(b)]. For the larger fraction of missing 
data, CACAO and AG show an increasing departure from the 
daily observations as compared to SG. This was already noticed 
with the simulation experiment where CACAO showed larger 
RMSE raw as compared to AG and SG [Fig. 3(c)], but smaller 
RMSE re f [Fig. 3(b)], particularly in case of high fraction of 
missing data and high level of uncertainties associated to the 
daily LAI data. The smoothness of the temporal profiles derived 
from CACAO confirms the good temporal consistency of the 
CACAO method. 

Smooth temporal profiles are expected since leaf area dy- 
namics results from incremental bio-physical processes ex- 
cept under sudden disturbance. The smoothness level of LAI 
temporal series was evaluated using the difference, SLAI be- 
tween LAI (t) product value at date t and the mean value be- 
tween the two bracketing dates: 6 LAI = 1/2 {LAI(t + St) + 
LAI(t — St)) — LAI(t), where St is the 10-days temporal 
sampling interval ([37]). Difference SLAI is computed only 
if the two bracketing LAI values at (t — St) and (t + St) 
exist. The smoother the temporal evolution, the smaller the 
SLAI difference should be. The histogram of SLAI over the 
BELMANIP2 sites in the 1981-2000 period (Fig. 7) shows 
the effectiveness of the three methods to smooth the AVHRR 
raw data. SG method appears the most sensitive to noise in 
the data conversely to AG method that provides the smoothest 
profiles since it benefits from fitting data to a single and smooth 
phenology model. CACAO constitutes an intermediate solution 
between SG and AG in terms of smoothness. It is very robust 
to accidents in the data since the climatology plays a major 
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Fig. 5. (a) Fraction of missing data for AVHRR daily products as a function of the latitude and the date of acquisition in 10° x 1-month cells. Evaluation over 

the BELMANIP2 sites during the 1981-2000 period, (b) Location of the BELMANIP2 sites where (i) CACAO and SG and AG TIMES AT methods successfully 
processed the AVHRR time series (unfilled circles), (ii) CACAO and SG success and AG fails (desert sites) (gray-filled circles) and (ii) only CACAO success 
(black filled circles). The numbers refer to the sites in Fig. 2 and Fig. 4. 
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Fig. 6. (a) RMSE r aw between CACAO and raw LAI data as a function of the latitude (10° steps) and the date of acquisition (monthly step), (b) RMSE raw 

of CACAO, AG, and SG over the raw data as a function of the fraction of missing data. Evaluation over the 445 BELMANIP2 sites during the 1981-2000 period. 



5 LAI 

Fig. 7. Histogram of 5LAI absolute difference representing temporal 
smoothness. 

regularization role while allowing fitting local variations at the 
same time. 

B. Use of CACAO to Characterize Inter-Annual Anomalies 

The estimated shift and scale parameters from CACAO 
provide indicators of phenological changes. For comparison 
purposes, the phenological parameters derived from AG and SG 
as implemented in TIMESAT for each of the full seasons were 
used. For AG and SG methods, the lag between the time of start 
or end of each season and the corresponding average date across 


all seasons was compared with the CACAO shift. The variation 
of the season amplitude parameter for AG and SG methods 
was divided by the corresponding average amplitude across all 
seasons and was compared to the CACAO scale factor. 

The simulation experiment was first completed to better 
assess the theoretical performances of the CACAO, AG, and 
SG to accurately date the main phenological stages. Then, 
performances will be analyzed over a larger set of actual sites. 

1) Theoretical Performances for Phenology : The 
RMSE^j computed over the simulated data as compared 
to the reference data was used to score the performances of 
CACAO, AG, and SG methods. Results (Fig. 8) show that 
the performances are highly depending on the daily LAI 
uncertainties and the fraction of missing data with similar 
tendencies as in Fig. 3(a). The CACAO method outperforms 
AG and SG methods in all the situations for SoS [Fig. 8(a)] and 
EoS [Fig. 8(b)]. AG provides lower RMSE^ e j values than 
SG. Conversely, only small differences are observed between 
the three methods for the amplitude [Fig. 8(c)]. 

2) Quantifying Phenological Changes in the Time Series: 
The comparison of phenological shifts and scales between the 
three methods as computed over the actual time series across 
the 445 BELMANIP2 sites shows a generally good agreement 
between all the methods with unbiased residuals (Fig. 9). 

However, the shift and scale parameters derived fromthe 
CACAO method appear more stable: lower RMSE^ aw values 
are observed (Fig. 10), quantifying the deviation between each 



1970 


IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 51, NO. 4, APRIL 2013 


Amplitude 



Fig. 8. Theoretical performances ( RMSE of the phenology extractions 
for the (a) start of season (SoS), (b) end of season (EoS), and (c) amplitude of 
the simulated LAI time series as derived from CACAO, AG, and SG methods 
as a function of the fraction of missing data. 
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Fig. 9. Histogram of differences of (a) the shift in the start of season (SoS) and 
end of season (EoS) and (b) the scale factor (variation in seasonal amplitude) 
representing temporal anomalies of LAI data to the climatological pattern as 
resulted from CACAO and TIMESAT AG and SG method adjustments over the 
BELMANIP2 sites for the period 1982-1992. 
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Fig. 10. RMSE^ aw of the phenology extractions for the (a) start of season 
(SoS), (b) end of season (EoS), and (c) amplitude of the actual AVHRR LAI 
time series as derived from CACAO, AG, and SG methods as a function of the 
fraction of missing data. 


season from the average value across all the seasons. Further, 
CACAO shift and scales appear only little sensitive to the 
fraction of missing data (Fig. 10). The same is observed for AG 
and SG SoS and EoS [Fig. 10(a) and (b)], while the amplitude 
derived from these two methods appears more sensitive to the 
fraction of missing data [Fig. 10(c)]. Although these results 
obtained on actual data do not constitute an undisputable proof 
of the performances of CACAO for quantifying inter-annual 
anomalies because of the lack of reference values, they con- 
tribute to consolidate the promising theoretical results. 

IV. Conclusion 

This paper introduces the CACAO method: a new climato- 
logical fitting approach for smoothing, gap filling, and quanti- 
fying vegetation anomalies in satellite time series. It is based 
on the fitting of a phenology model. This model is specific to 
each pixel and is derived from the climatology computed over 


the time series of the considered pixel. CACAO method appears 
to be a compromise between very flexible methods such as the 
adaptive SG filter and methods based on a unique phenology 
model such as the AG method. The performances of CACAO 
were evaluated by comparison with the widely used AG and SG 
methods as implemented in the TIMESAT toolbox. In terms of 
the required computer resources, CACAO is as demanding as 
AG, although SG is twice faster. 

The CACAO method was first applied to simulated time 
series of daily LAI estimates as derived from AVHRR observa- 
tions. This simulation experiment shows a significant improve- 
ment in the theoretical performances of LAI reconstructions 
and phenology extraction for CACAO as compared to SG and 
AG methods in all the situations, particularly for the higher lev- 
els of the fraction of missing data and daily LAI uncertainties. 

Results observed over actual AVHRR satellite data showed 
the potentials of the proposed CACAO method to capture 
the seasonality in the data while improving consistency and 
continuity of the time series. CACAO overcomes the difficulties 
of AG and SG methods to process the irregular nature of 
the AVHRR time series due to the large fraction of missing 
data (around 73%) and the high noise level associated. The 
AG and SG methods failed in processing, respectively, 238 
and 209 of 445 BELMANIP2 sites because not enough high 
quality data was available for fitting the function which re- 
sulted, respectively, in 60% and 47% of invalid products. In 
contrast, the CACAO method allowed filling all the gaps and 
in particular during long periods of missing data where the 
climatology computed across the 20-years period of available 
AVHRR observations was used. The assessment of the temporal 
consistency and the smoothness of LAI profiles revealed that 
CACAO constitutes an intermediate solution between AG and 
SG in terms of robustness and adaptability to local variations. 

The scale and shift parameters derived from CACAO allowed 
the quantification of inter-annual anomalies and showed a rel- 
atively good consistency with the seasonality extracted from 
AG or SG phenological parameters. However, the simulated 
experiments conducted in this study showed a better accuracy 
of CACAO for the dating of the start or the end of the season 
as compared to that derived from AG and SG methods. Nev- 
ertheless, further confrontations with climatic variables or phe- 
nological models [14] as well as validation with ground-based 
phenological observations should be conducted. This accuracy 
assessment should include the spatio-temporal performances of 
CACAO as compared with other existing methods including 
Fourier analysis and piecewise decomposition. 

However, the main limitation of CACAO reconstruction 
method is its inability to capture underlying atypical modes 
of seasonality including rapid natural and human induced dis- 
turbances in the LAI time series that strongly differ from the 
average climatology (e.g., flood or fire events, changes in the 
land cover). To prevent from such drawback, the resulting fitted 
climatology can be fused with a product closer to the actual LAI 
observations as the one resulting from the adaptive SG filtering. 
This fusion approach should overcome CACAO limitations 
when enough daily LAI observations are available. It will be 
implemented in future products to generate continuous long- 
term Earth System Data Records from remote sensing data 
collected with several sensors over the past three decades [7]. 
These algorithms will be adapted to the next generation of 
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sensors such as VIIRS, PROBA-V, and Sentinel 3. These long- 
term data records and the proposed climatology fitting approach 
are expected to contribute to identify the trends at the global 
scale corresponding either to a change (positive or negative) in 
vegetation amount or in a shift of the seasonality. 
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